¹The University of Hong Kong  ²Zhejiang University  ³Tencent  ⁴Texas A&M University
In this paper, we present Gen6D, a generalizable model-free 6-DoF object pose estimator. Existing generalizable pose estimators either require high-quality object models or need additional depth maps or object masks at test time, which significantly limits their application scope. In contrast, our pose estimator only requires some posed images of the unseen object and accurately predicts its pose in arbitrary environments. Gen6D consists of an object detector, a viewpoint selector and a pose refiner, none of which requires the 3D object model, and all of which generalize to unseen objects. Experiments show that Gen6D achieves state-of-the-art results on two model-free datasets: the MOPED dataset and a new GenMOP dataset collected by us. In addition, on the LINEMOD dataset, Gen6D achieves results competitive with instance-specific pose estimators.
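The three stages above (detect, select a viewpoint, refine) can be sketched as a minimal pipeline. This is a hypothetical, simplified illustration of the data flow only; the function names, placeholder logic, and array shapes are assumptions, not the actual Gen6D implementation.

```python
import numpy as np

def detect(query_img):
    """Stage 1 (sketch): locate the object in the query image,
    yielding a 2D center and a scale (rough translation)."""
    h, w = query_img.shape[:2]
    return np.array([w / 2.0, h / 2.0]), 1.0  # placeholder center + scale

def select_viewpoint(query_img, ref_imgs, ref_poses):
    """Stage 2 (sketch): pick the reference view most similar to the
    query to initialize the rotation. A real selector would score
    every reference image; here we just take the first one."""
    return ref_poses[0]

def refine(pose, query_img, ref_imgs, steps=3):
    """Stage 3 (sketch): iteratively refine the initial pose.
    Gen6D uses a feature-volume-based refiner; this stub keeps
    the pose unchanged."""
    for _ in range(steps):
        pose = pose  # placeholder update
    return pose

def estimate_pose(query_img, ref_imgs, ref_poses):
    """End-to-end sketch: detector -> viewpoint selector -> refiner,
    returning a 3x4 object pose [R|t]."""
    center, scale = detect(query_img)
    init_pose = select_viewpoint(query_img, ref_imgs, ref_poses)
    return refine(init_pose, query_img, ref_imgs)
```

Note that no 3D model enters the pipeline: the only object-specific inputs are the posed reference images.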
Both DeepIM and Gen6D are trained on the same training dataset and generalize to these unseen objects. Gen6D generalizes better than DeepIM thanks to its feature-volume-based refiner.
PVNet is trained on each object using the reference images (about 200), which are too few to train a PVNet for accurate pose estimation.
A simple AR application: with the known poses, we are able to render an adorable Dodoco to replace the cute Lulu Piggy. Gen6D requires neither the object model nor an object mask. By simply capturing reference images of an unseen object with a cellphone and recovering their poses with COLMAP, Gen6D can predict the object pose in arbitrary query images. Thus, Gen6D can be easily applied to everyday objects for AR/VR applications.
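Once the pose is predicted, the AR overlay step reduces to standard pinhole projection: transform the virtual object's 3D points by the estimated pose [R|t] and project them with the camera intrinsics. A minimal sketch, where the intrinsics, pose, and 3D points are made-up illustrative values:

```python
import numpy as np

def project_points(pts_3d, pose, K):
    """Project Nx3 object-space points into the image using a 3x4
    pose [R|t] and 3x3 intrinsics K (standard pinhole model)."""
    pts_h = np.hstack([pts_3d, np.ones((len(pts_3d), 1))])  # Nx4 homogeneous
    cam = pose @ pts_h.T          # 3xN points in camera space
    uv = K @ cam                  # 3xN projected (homogeneous pixels)
    return (uv[:2] / uv[2]).T     # Nx2 pixel coordinates

# Hypothetical intrinsics; identity rotation with the object 2 m ahead.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
pose = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])
corners = np.array([[0.0, 0.0, 0.0],   # object origin
                    [0.1, 0.0, 0.0]])  # 10 cm along the object's x-axis
px = project_points(corners, pose, K)  # pixel locations for the overlay
```

The object origin lands at the principal point (320, 240), and the 10 cm offset shifts it 25 px horizontally at 2 m depth, which is exactly where a rendered model would be drawn on the query image.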
@inproceedings{liu2022gen6d,
title={Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images},
author={Liu, Yuan and Wen, Yilin and Peng, Sida and Lin, Cheng and Long, Xiaoxiao and Komura, Taku and Wang, Wenping},
booktitle={ECCV},
year={2022}
}